We need quite a few libraries:
library(cluster)
library(factoextra)
library(tidymodels)
library(plotly)
library(janitor)
library(GGally)
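# We could think about transforming the serving size, e.g. turning it into a numeric value in grams,
# but what do we do with the beverages, coffee etc.?
# clean_names() converts the column names to snake_case, e.g. "Serving Size" becomes serving_size
mcd_menu <- mcd_menu %>%
  janitor::clean_names()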
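The comment in the chunk above asks how to turn the serving size into numbers. One option is to extract the gram value where there is one and leave everything else as NA instead of forcing a conversion. This is a minimal sketch, assuming the column holds strings like "4.8 oz (136 g)" for food and "16 fl oz cup" for drinks (check this against your copy of the data); `serving_grams()` is a hypothetical helper, not part of janitor or any other package.

```r
# Hypothetical helper: pull the gram value out of strings such as "4.8 oz (136 g)".
# Strings without a "(... g)" part, e.g. the assumed beverage format "16 fl oz cup",
# become NA rather than a guessed number.
serving_grams <- function(x) {
  has_g <- grepl("\\(([0-9.]+) g\\)", x)
  out <- rep(NA_real_, length(x))
  out[has_g] <- as.numeric(sub(".*\\(([0-9.]+) g\\).*", "\\1", x[has_g]))
  out
}

serving_grams(c("4.8 oz (136 g)", "16 fl oz cup", "1 cookie (33 g)"))
```

Keeping beverages as NA makes the open question explicit: you can later decide whether to drop them, treat fluid ounces separately, or simply leave serving size out of the clustering.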
How are our nutritional values related to each other? Which category's items contain the most sugar? The most calories?
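A minimal sketch of how these questions can be answered: average sugar and calories per category, plus a correlation matrix of the numeric columns. It uses base R so it runs on its own; in the tutorial's tidyverse style the same summary would be `group_by()` plus `summarise()`, and GGally (loaded above) offers `ggpairs()` for a scatterplot matrix of the pairwise relations. The data frame `menu_toy`, its category labels, and all values are invented for illustration and are NOT McDonald's data; the column names assume the snake_case names produced by `clean_names()`.

```r
# Hypothetical toy stand-in for mcd_menu after clean_names().
# All values are invented for illustration only.
menu_toy <- data.frame(
  category = c("A", "A", "B", "B", "C", "C"),
  calories = c(300, 450, 140, 220, 530, 610),
  sugars   = c(3, 5, 4, 6, 70, 85),
  protein  = c(12, 20, 8, 15, 10, 12)
)

# Mean calories and sugars per category, most sugary category first
by_cat <- aggregate(cbind(calories, sugars) ~ category, data = menu_toy, FUN = mean)
by_cat <- by_cat[order(-by_cat$sugars), ]
by_cat

# Pairwise Pearson correlations between all numeric columns
num_cols <- sapply(menu_toy, is.numeric)
round(cor(menu_toy[num_cols]), 2)
```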
We want to scale the data so that all features (nutritional characteristics) are on the same scale, i.e. have a mean of 0 and a variance of 1. We do this with across() inside mutate(); where(is.numeric) makes sure only the numeric columns are scaled. Be aware that scale() returns a matrix, while we only need a numeric vector, hence the as.numeric().
mcd_menu_scaled <- mcd_menu %>%
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x))))
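To see why `as.numeric()` is needed, here is a small base R check on a made-up vector: `scale()` returns a one-column matrix carrying `"scaled:center"` and `"scaled:scale"` attributes, and `as.numeric()` drops those to give a plain vector with mean 0 and standard deviation 1. The last lines are a base R equivalent of the `across(where(is.numeric), ...)` step, applied to a hypothetical data frame `df` with invented values.

```r
x <- c(2, 4, 6)

z <- scale(x)
is.matrix(z)            # TRUE: scale() returns a 3 x 1 matrix with attributes
z_vec <- as.numeric(z)  # plain numeric vector: -1, 0, 1

# Same result by hand: subtract the mean, divide by the (n - 1) standard deviation
manual <- (x - mean(x)) / sd(x)

# Base R version of mutate(across(where(is.numeric), ~ as.numeric(scale(.x)))):
# only numeric columns are scaled, the text column is left untouched
df <- data.frame(item = c("a", "b", "c"), calories = c(100, 200, 300), sugars = x)
num <- sapply(df, is.numeric)
df[num] <- lapply(df[num], function(v) as.numeric(scale(v)))
df
```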
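We can use the function fviz_nbclust() from the package factoextra to produce two diagnostic plots: one showing the total within-cluster sum of squares for each k (the elbow method) and one showing the average silhouette width. Both only work on numeric columns, so pass the numeric part of mcd_menu_scaled, e.g. via select(where(is.numeric)). Judging from these plots, k = 4 looks like a good choice here.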
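`fviz_nbclust()` draws these curves for you. The sketch below shows what the two diagnostics measure, using `kmeans()` and `cluster::silhouette()` on synthetic 2-D data with three well-separated groups (generated here, not the menu data): for each k it records `tot.withinss` and the mean silhouette width. On the menu data you would use the numeric columns of `mcd_menu_scaled` in place of `X`. For this synthetic data the silhouette is expected to peak at k = 3, which is also where the WSS curve bends.

```r
library(cluster)  # provides silhouette()

set.seed(42)
# Synthetic data: three groups of 30 points around well-separated centers
centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
X <- do.call(rbind, lapply(seq_len(nrow(centers)), function(i) {
  cbind(rnorm(30, centers[i, 1]), rnorm(30, centers[i, 2]))
}))

ks <- 2:6
d <- dist(X)  # Euclidean distances, reused for every silhouette computation
diag_tbl <- data.frame(k = ks, wss = NA_real_, avg_sil = NA_real_)

for (i in seq_along(ks)) {
  km <- kmeans(X, centers = ks[i], nstart = 25)  # several random starts for stability
  diag_tbl$wss[i] <- km$tot.withinss             # total within-cluster sum of squares
  sil <- silhouette(km$cluster, d)               # one silhouette width per point
  diag_tbl$avg_sil[i] <- mean(sil[, "sil_width"])
}

diag_tbl
```

Reading the table, look for the k where WSS stops dropping sharply and the average silhouette is highest; on the menu data the tutorial reads k = 4 from the corresponding plots.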
After running kmeans() with k = 4, we want to take a look at the result. Let's create an interactive plot that lets us explore the clusters in detail. Note that we only plot calories and sugar here; other variable combinations would be possible. Look at cluster 3: isn't it fun that the salads form a cluster of their own? (Cluster numbers from kmeans() depend on the random starting centers, so set a seed with set.seed() if you want to reproduce the same labels.)
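A minimal sketch of this step under stated assumptions: `menu_toy` is a hypothetical data frame of eight invented items forming two obvious groups, so it uses k = 2 rather than the k = 4 used on the real menu. It clusters on the scaled numeric columns, attaches the labels as a factor to the unscaled data (so the plot shows real units), and hands that to `plotly::plot_ly()` with the item name as hover text. The plot is only built when plotly is installed.

```r
set.seed(123)  # k-means starts are random; a fixed seed keeps the cluster labels reproducible

# Hypothetical toy menu: invented values, NOT McDonald's data
menu_toy <- data.frame(
  item     = paste0("item_", 1:8),
  calories = c(150, 170, 160, 140, 600, 650, 620, 580),
  sugars   = c(4, 5, 3, 6, 80, 90, 85, 75)
)

# Cluster on the scaled numeric columns only; kmeans() cannot use the text column
num_scaled <- scale(menu_toy[sapply(menu_toy, is.numeric)])
km <- kmeans(num_scaled, centers = 2, nstart = 25)

# Attach the labels to the unscaled data as a factor, so plotly uses discrete colors
menu_toy$cluster <- factor(km$cluster)

if (requireNamespace("plotly", quietly = TRUE)) {
  # Interactive scatter of calories vs. sugars; hovering shows the item name.
  # Type `p` in an interactive session to display it.
  p <- plotly::plot_ly(menu_toy, x = ~calories, y = ~sugars, color = ~cluster,
                       text = ~item, type = "scatter", mode = "markers")
}

table(menu_toy$cluster)
```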